    Sound and Image

    We hear sounds, and their sources, and their audible qualities. Sounds and their sources are essentially dynamic entities, not wholly present at any given moment, but unfolding through their temporal interval. Sounds and their sources, essentially dynamic entities, are the bearers or substrata of audible qualities. Audible qualities are qualities essentially sustained by activity. The only bearers of audible qualities present in auditory experience are essentially dynamic entities. Bodies are not, in this sense, essentially dynamic entities and so are not present in our auditory experience. Though bodies are absent from auditory experience, we may nonetheless attend to them in audition when an audible sound-generating event in which they participate presents a dynamic aural image of them.

    Local Visual Microphones: Improved Sound Extraction from Silent Video

    Sound waves cause small vibrations in nearby objects. A few techniques exist in the literature that can extract sound from video. In this paper we study local vibration patterns at different image locations. We show that different locations in the image vibrate differently. We carefully aggregate local vibrations and produce a sound quality that improves on the state of the art. We show that local vibrations can have a time delay because sound waves take time to travel through the air. We use this phenomenon to estimate sound direction. We also present a novel algorithm that speeds up sound extraction by two to three orders of magnitude and reaches real-time performance on a 20 kHz video. Comment: Accepted to BMVC 201
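
The time-delay idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a plain brute-force cross-correlation, and the function name `estimate_delay`, the `max_lag` parameter, and the synthetic signals are all assumptions made for illustration. Given two vibration signals recovered at two image locations, the lag that maximises their cross-correlation is taken as the delay between them.

```python
import random


def estimate_delay(reference, delayed, sample_rate, max_lag):
    """Return the delay (in seconds) of `delayed` relative to `reference`.

    Hypothetical helper: tries every integer lag in [-max_lag, max_lag]
    and keeps the one with the largest cross-correlation score.
    """
    best_lag = 0
    best_score = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                score += ref_value * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# Synthetic example (assumed values): white noise observed at one location,
# and the same noise arriving 7 samples later at a second location.
random.seed(0)
SAMPLE_RATE = 20_000  # samples per second, assumed for this sketch
TRUE_LAG = 7          # samples
signal_a = [random.gauss(0.0, 1.0) for _ in range(500)]
signal_b = [0.0] * TRUE_LAG + signal_a[:-TRUE_LAG]

delay_seconds = estimate_delay(signal_a, signal_b, SAMPLE_RATE, max_lag=20)
```

A positive result means the second signal lags the first. Turning such a delay into a direction estimate would additionally require the geometry of the scene, which this sketch does not model.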

    Investigation on the Phantom Image Elevation Effect

    Listening tests have been carried out in order to evaluate how the phantom image elevation effect depends on the horizontal stereophonic base angle. Seven ecologically valid sound sources as well as four noise sources were tested. Subjects judged the perceived positions of the phantom centre image created with seven loudspeaker base angles. Results generally showed that perceived images were elevated from front to above as the loudspeaker base angle increased up to around 180°. This tendency depended on the spectral characteristics of the sound source. The perceived results are explained from both physical and cognitive points of view.

    Self-Supervised Audio-Visual Co-Segmentation

    Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data. In this paper we develop a neural network model for visual object segmentation and sound source separation that learns from natural videos through self-supervision. The model is an extension of recently proposed work that maps image pixels to sounds. Here, we introduce a learning approach to disentangle concepts in the neural networks, and assign semantic categories to network feature channels to enable independent image segmentation and sound source separation after audio-visual training on videos. Our evaluations show that the disentangled model outperforms several baselines in semantic segmentation and sound source separation. Comment: Accepted to ICASSP 201

    Robust Sound Event Classification using Deep Neural Networks

    The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to advances in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for the sound event classification task, although spectrogram-based or auditory image analysis techniques reportedly achieve superior performance in noise. This paper outlines a sound event classification framework that compares auditory image front end features with spectrogram image-based front end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task at different levels of corrupting noise, and with several system enhancements, and is shown to compare very well with current state-of-the-art classification techniques.

    Time-resolved quantitative multiphase interferometric imaging of a highly focused ultrasound pulse

    Interferometric imaging is a well-established method to image phase objects by mixing the image wavefront with a reference one on a CCD camera. It has also been applied to fast transient phenomena, mostly through the analysis of single interferograms. It is shown that for repetitive phenomena multiphase acquisition brings significant advantages. A 1 MHz focused sound field emitted by a hemispherical piezotransducer in water is imaged as an example. Quantitative image analysis provides high-resolution sound field profiles. Pressure at focus determined by this method agrees with measurements from a fiber-optic probe hydrophone. This confirms that multiphase interferometric imaging can indeed provide quantitative measurements.

    Emotion resonance and divergence: a semiotic analysis of music and sound in 'The Lost Thing', an animated short film and 'Elizabeth' a film trailer

    Music and sound contributions of interpersonal meaning to film narratives may be different from or similar to meanings made by language and image, and dynamic interactions between several modalities may generate new story messages. Such interpretive potentials of music and voice sound in motion pictures are rarely considered in social semiotic investigations of intermodality. This paper therefore shares two semiotic studies of distinct and combined music, English speech and image systems in an animated short film and a promotional film trailer. The paper considers the impact of music and voice sound on interpretations of film narrative meanings. A music system relevant to the analysis of filmic emotion is proposed. Examples show how music and intonation contribute meaning to lexical, visual and gestural elements of the cinematic spaces. Also described are relations of divergence and resonance between emotion types in various couplings of music, intonation, words and images across story phases. The research is relevant to educational knowledge about sound, and semiotic studies of multimodality.

    Sound, Image, Silence

    A visionary new approach to the Americas during the age of colonization, made by engaging with the aural aspects of supposedly “silent” images. Colonial depictions of the North and South American landscape and its indigenous inhabitants fundamentally transformed the European imagination, but how did those images reach Europe, and how did they make their impact? In Sound, Image, Silence, noted art historian Michael Gaudio provides a groundbreaking examination of the colonial Americas by exploring the special role that aural imagination played in visible representations of the New World. Considering a diverse body of images that cover four hundred years of Atlantic history, Sound, Image, Silence addresses an important need within art history: to give hearing its due as a sense that can inform our understanding of images. Gaudio locates the noise of the pagan dance, the discord of battle, the din of revivalist religion, and the sublime sounds of nature in the Americas, such as lightning, thunder, and the waterfall. He invites readers to listen to visual media that seem deceptively couched in silence, offering bold new ideas on how art historians can engage with sound in inherently “mute” media. Sound, Image, Silence includes readings of Brazilian landscapes by the Dutch painter Frans Post, a London portrait of Benjamin Franklin, Thomas Edison’s early Kinetoscope film Sioux Ghost Dance, and the work of Thomas Cole, founder of the Hudson River School of American landscape painting. It masterfully fuses a diversity of work across vast social, cultural, and spatial distances, giving us both a new way of understanding sound in art and a powerful new vision of the New World.